DESCRIPTION

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, PROGRAM, AND RECORDING MEDIUM

Technical Field 

The present invention relates to an information processing device for
editing a multiplexed stream consisting of video and audio streams with video
frame accuracy and seamlessly reproducing the edit point, its method, a program,
and a recording medium, and to an information processing device that generates a
multiplexed stream in the optimum form for seamless reproduction and a
recording medium that stores the multiplexed stream data.

The present application claims priority from Japanese Patent Application 
No. 2003-130661 filed on May 8, 2003, the content of which is hereby
incorporated by reference into this application. 

Background Art 

A method for editing a multiplexed stream of video and audio streams
with video frame accuracy and seamlessly reproducing edit points is described in
Jpn. Pat. Appln. Laid-Open Publication No. 2000-175152, Jpn. Pat. Appln.
Laid-Open Publication No. 2001-544118, and Jpn. Pat. Appln. Laid-Open
Publication No. 2002-158974.

FIG. 1 is a block diagram showing a conventional DVR-STD model
(DVR MPEG2 transport stream player model) (hereinafter referred to as
"player") 101. The DVR-STD is a conceptual model of the decode
processing used in generating and examining an AV stream that is referred to by
two seamlessly connected PlayItems.

As shown in FIG. 1, in the player 101, a TS (Transport Stream) file read
out from a readout section (DVR drive) 111 at a bit rate Rud is buffered in a read
buffer 112. From the read buffer 112, a source packet is read out to a source
depacketizer 113 at a maximum bit rate Rmax.

A pulse oscillator (27 MHz X-tal) 114 generates a 27 MHz pulse. An
arrival time clock counter 115 is a binary counter that counts the 27 MHz
pulses and supplies the source depacketizer 113 with a count value
arrival_time_clock(i) of the arrival time clock counter at a time t(i).

One source packet has one transport packet and its arrival_time_stamp.
When the arrival_time_stamp of the current source packet is equal to the value
of the 30 least significant bits (LSBs) of arrival_time_clock(i), the transport
packet of the current source packet is output from the source depacketizer 113.
TS_recording_rate is the bit rate of a transport stream (hereinafter referred to as
"TS"). The notations n, TBn, MBn, EBn, TBsys, Bsys, Rxn, Rbxn, Rxsys, Dn,
Dsys, On, and Pn(k) shown in FIG. 1 are the same as those defined in the T-STD
(transport stream system target decoder) of ISO/IEC 13818-1 (MPEG2 systems
specification).
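The output rule of the source depacketizer described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the counter is modeled as a plain integer that advances by one per 27 MHz pulse.

```python
# Illustrative sketch of the source depacketizer output rule.
# Function names are hypothetical; only the 30-bit comparison comes from the text.

ATS_BITS = 30                      # arrival_time_stamp is compared on 30 bits
ATS_MASK = (1 << ATS_BITS) - 1     # mask selecting the 30 least significant bits


def should_output(arrival_time_stamp: int, arrival_time_clock: int) -> bool:
    """True when the transport packet of the current source packet is output."""
    return arrival_time_stamp == (arrival_time_clock & ATS_MASK)


def ticks_until_output(arrival_time_stamp: int, arrival_time_clock: int) -> int:
    """Number of 27 MHz ticks to wait until the stamp matches (0 if it matches now).

    The subtraction is taken modulo 2**30 so that counter wraparound is handled.
    """
    return (arrival_time_stamp - arrival_time_clock) & ATS_MASK
```

Because only the 30 least significant bits are compared, the upper bits of the counter do not affect the decision, and a stamp that is numerically smaller than the current count is simply reached after wraparound.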

The decoding process in the above conventional player 101 will next be
described. Firstly, the decoding process during reproduction of a single DVR
MPEG2 TS will be described. During reproduction of a single DVR MPEG2
TS, the timing at which a transport packet is output from an output section 110
so as to be input to TB1, TBn or TBsys of the DVR-STD, which is a decoder
120, is determined by the arrival_time_stamp of the source packet. The
specification related to buffering operations of TB1, MB1, EB1, TBn, Bn, TBsys
and Bsys is the same as in the case of the T-STD specified by ISO/IEC 13818-1.
The specification related to decoding and presentation operations is also the same
as in the case of the T-STD specified by ISO/IEC 13818-1.

Next, the decoding process during reproduction of seamlessly connected
PlayItems will be described. Here, reproduction of a previous stream TS1 and
a current stream TS2 that are referred to by the seamlessly connected PlayItems
will be described.

During the shift between a certain AV stream (TS1) and the next AV
stream (TS2) seamlessly connected to the AV stream (TS1), the time axis of the
TS2 arrival time base is not the same as that of the TS1 arrival time base.
Further, the time axis of the TS2 system time base is not the same as that of the
TS1 system time base. The presentation of video images needs to be continued
seamlessly. An overlap may exist in the presentation time of the audio
presentation units.

Next, the input timing of the transport packet read out from the source
depacketizer to the DVR-STD will be described.

(1) Before time T1, at which the input of the last video packet of TS1 to
TB1 of the DVR-STD has been completed

Before time T1, the input timing to buffer TB1, TBn or TBsys of the
DVR-STD is determined by the arrival_time_stamp of the source packet of TS1.

(2) From time T1 to time T2, at which the input of the last byte of the
remaining packets of TS1 has been completed

The remaining packets of TS1 must be input to the buffer TBn or TBsys of
the DVR-STD at a bit rate of TS_recording_rate (TS1) (the maximum bit rate of
TS1). TS_recording_rate (TS1) is the value of TS_recording_rate defined by
ClipInfo() corresponding to Clip 1. The time at which the last byte of TS1 is
input to the buffer is time T2. Therefore, from time T1 to time T2, the
arrival_time_stamp of the source packet is ignored.

Assuming that N1 is the number of bytes of the transport packets of TS1
that follow the last video packet of TS1, the time between T1 and T2 (time
T_{2-1} = T2 - T1) is the time required to complete the input of N1 bytes at a
bit rate of TS_recording_rate (TS1), and is represented by the following
equation (1).

T_{2-1} = T2 - T1 = N1 / TS_recording_rate (TS1) ... (1)
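Equation (1) can be evaluated with a few lines of code. The sketch below assumes, for illustration only, that TS_recording_rate is expressed in bytes per second so that it divides a byte count directly; the function name and the sample values are hypothetical.

```python
# Minimal sketch of equation (1): T_{2-1} = N1 / TS_recording_rate(TS1).
# Assumption: the rate is given in bytes per second (same unit basis as N1).

def t2_minus_t1(n1_bytes: int, ts_recording_rate: float) -> float:
    """Time in seconds needed to input the N1 bytes that follow the last video packet."""
    if ts_recording_rate <= 0:
        raise ValueError("TS_recording_rate must be positive")
    return n1_bytes / ts_recording_rate


# Hypothetical example: 80,000 trailing bytes at 6,000,000 bytes per second.
example_gap_seconds = t2_minus_t1(80_000, 6_000_000)
```

The larger N1 becomes, the longer the interval during which arrival_time_stamp is ignored, which is what drives the additional buffering discussed in the text.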



From time T1 to time T2, the values of Rxn and Rxsys shown in FIG. 1 are
changed to the value of TS_recording_rate (TS1). Except for the above rule, the
buffering operation is the same as that of the T-STD.

Since the values of Rxn and Rxsys shown in FIG. 1 are changed to the value
of TS_recording_rate (TS1) between time T1 and T2, an additional buffer amount
(data amount corresponding to about 1 second) is required in addition to the
buffer amount defined by the T-STD so that an audio decoder can process the
input data between time T1 and T2.

(3) After time T2

At time T2, the arrival time clock counter 115 is reset to the value of the
arrival_time_stamp of the first source packet of TS2. The input timing to the
buffer TB1, TBn or TBsys of the DVR-STD is determined by the
arrival_time_stamp of the source packet of TS2. Rxn and Rxsys are changed to
the values defined by the T-STD.

Next, video presentation timing will be described. A video presentation
unit must be presented seamlessly through its connection point.

Here, it is assumed that
STC1 (System Time Clock 1): time axis of TS1 system time base
STC2: time axis of TS2 system time base (correctly, STC2 starts from the time
at which the first PCR (Program Clock Reference) of TS2 is input to the T-STD).



An offset value between STC1 and STC2 is determined as follows.
Assuming that

PTS1_end: PTS on STC1 corresponding to the last video presentation unit
of TS1

PTS2_start: PTS on STC2 corresponding to the first video presentation unit
of TS2

Tpp: presentation period of the last video presentation unit,

the offset value STC_delta between the two system time bases is represented by
the following equation (2).

STC_delta = PTS1_end + Tpp - PTS2_start ... (2)

Next, audio presentation timing will be described. An overlap of the
presentation timing of the audio presentation units may exist at the connection
point of TS1 and TS2, the overlap being from 0 to less than 2 audio frames.
The player 101 must select one of the audio samples and re-synchronize the
presentation of the audio presentation units with the corrected time base after the
connection point.

The processing for control of the system time clock of the DVR-STD carried
out by the player 101 when the time shifts from TS1 to TS2 seamlessly
connected to TS1 will be described. The system time clocks may overlap
between time T2 and time T5, at which the last audio presentation unit of TS1
is presented. Between time T2 and T5, the DVR-STD switches
the system time clock from the value (STC1) of the old time base to the value
(STC2) of the new time base. The value of STC2 can be represented by the
following equation (3).

STC2 = STC1 - STC_delta ... (3)
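Equations (2) and (3) together describe a simple change of time base. The following sketch illustrates them with hypothetical function names and sample values, assuming all quantities are expressed in the same clock ticks.

```python
# Sketch of equations (2) and (3). All values are assumed to be in the same
# clock units (for example 90 kHz ticks); names and numbers are illustrative.

def stc_delta(pts1_end: int, tpp: int, pts2_start: int) -> int:
    """Equation (2): offset between the two system time bases."""
    return pts1_end + tpp - pts2_start


def to_stc2(stc1: int, delta: int) -> int:
    """Equation (3): convert a value on STC1 to the corresponding value on STC2."""
    return stc1 - delta


# Hypothetical example: the last picture of TS1 is presented at 900000 for
# 3003 ticks, and the first picture of TS2 carries PTS 180000.
delta = stc_delta(pts1_end=900_000, tpp=3_003, pts2_start=180_000)
first_pts_on_stc2 = to_stc2(900_000 + 3_003, delta)
```

In this example the end of the last TS1 picture, converted to STC2, lands exactly on the PTS of the first TS2 picture, which is the property that keeps the video presentation continuous across the connection point.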

An encoding condition that TS1 and TS2 must meet when the time shifts
from TS1 to TS2 seamlessly connected to TS1 will be described.
It is assumed that

STC1^1_video_end: value of STC on system time base STC1 when the last
byte of the last video packet of TS1 reaches TB1 of the DVR-STD

STC2^2_video_start: value of STC on system time base STC2 when the first
byte of the first video packet of TS2 reaches TB1 of the DVR-STD

STC2^1_video_end: value obtained by converting the value of STC1^1_video_end
to the value on system time base STC2.

In this case, STC2^1_video_end is represented by the following equation (4).

STC2^1_video_end = STC1^1_video_end - STC_delta ... (4)



It is necessary to meet the following two conditions in order for the
decoder 120 to conform to the DVR-STD.
(Condition 1)

The timing at which the first video packet of TS2 reaches TB1 must meet
the following inequality (5).

STC2^2_video_start > STC2^1_video_end + T_{2-1} ... (5)

The partial streams of Clip 1 and/or Clip 2 need to be re-encoded and/or
re-multiplexed in order to meet the above inequality (5).
(Condition 2)

On the time axis of the system time base obtained by converting STC1
and STC2 to the same time axis as each other, inputs of the video packets from
TS1 and subsequent inputs of the video packets from TS2 must neither overflow
nor underflow the video buffer.
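Condition 1 lends itself to a direct check. The sketch below combines equation (4) with inequality (5); the function name and sample values are hypothetical, and all arguments are assumed to share one clock unit. Condition 2 would require a full buffer simulation and is not covered here.

```python
# Sketch of Condition 1: inequality (5) evaluated after the conversion of
# equation (4). Names are illustrative; all arguments use the same clock units.

def meets_condition_1(stc2_video_start: int,
                      stc1_video_end: int,
                      stc_delta: int,
                      t2_1: int) -> bool:
    """Return True when STC2^2_video_start > STC2^1_video_end + T_{2-1}."""
    stc2_video_end = stc1_video_end - stc_delta   # equation (4)
    return stc2_video_start > stc2_video_end + t2_1
```

For example, with stc1_video_end = 5000, stc_delta = 4200 and t2_1 = 100, the first video packet of TS2 must arrive strictly later than 900 on the STC2 axis.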

However, as described above, the conventional player 101 using the
DVR-STD model needs extra buffering to process the input data between time T1
and T2. That is, since the remaining packets of TS1 are input to the buffer TBn
or TBsys of the DVR-STD at a bit rate of TS_recording_rate (TS1) (the maximum
bit rate of TS1) between time T1 and T2, an additional buffer having a capacity
capable of buffering the data amount corresponding to about 1 second is required
in addition to the buffer amount defined by the T-STD.

This buffer capacity is based on the following factor. That is, within an
MPEG2 TS, the audio data reproduced in synchronization with the video data
at a certain byte position can be located apart from that video data by a
multiplexing phase difference within a predetermined range, and the maximum
value of this multiplexing phase difference is equal to the data amount
corresponding to 1 second. Therefore, the maximum value of N1 of the above
equation (1) is equal to the audio data corresponding to up to 1 second. Between
time T1 and T2, the arrival_time_stamp of the source packet is ignored and the
source packets corresponding to the data amount of N1 are input to the audio
buffer at the maximum bit rate of the TS. Therefore, an additional buffer amount
(data amount corresponding to about 1 second) is required in addition to the
buffer amount defined by the T-STD.

The volume of this additional buffer can be calculated as follows. That
is, in the case of an audio stream that has been encoded according to Dolby AC-3
at, e.g., 640 kbps, the audio data corresponding to 1 second is 80 kBytes (=
640 kbits). As a result, an additional buffer of 80 kBytes is required.

In the case of an audio stream (24 bit sample, 96 kHz sampling
frequency, 8 channels) that has been encoded according to the Linear PCM method,
the audio data corresponding to 1 second is about 18 Mbits (= 24 bit sample ×
96,000 samples/sec × 8 channels). As a result, an additional buffer of about 3
Mbytes is required. Thus, in the case where the above multi-channel audio data
is employed, the size of the additional buffer becomes extremely large.
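The arithmetic behind these figures can be reproduced with a short sketch. The function names are hypothetical, and decimal units (1 kByte = 1,000 bytes) are assumed for illustration.

```python
# Sketch of the additional-buffer arithmetic for one second of audio.
# Function names are illustrative; decimal units (1 kByte = 1000 bytes) are assumed.

def lpcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Bit rate in bits per second of uncompressed Linear PCM audio."""
    return bits_per_sample * sample_rate_hz * channels


def one_second_audio_bytes(bit_rate_bps: int) -> int:
    """Bytes needed to hold one second of audio at the given bit rate."""
    return bit_rate_bps // 8


ac3_bytes = one_second_audio_bytes(640_000)                          # 80,000 bytes
lpcm_bytes = one_second_audio_bytes(lpcm_bit_rate(24, 96_000, 8))    # 2,304,000 bytes
```

Under these assumptions the Linear PCM case evaluates to 18,432,000 bits, or roughly 2.3 Mbytes, which is of the same order as the rounded figure quoted in the text and nearly thirty times the AC-3 case.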

Disclosure of the Invention 

The present invention has been proposed in view of the conventional 
situation, and an object thereof is to provide an information processing device
including an audio buffer having an optimum capacity for realizing seamless 
decoding of two multiplexed streams in each of which an audio stream and 
video stream are multiplexed, its method, a program and a recording medium, 
and an information processing device that generates a multiplexed stream 
corresponding to the audio buffer capacity, its method, and recording medium 
that records the multiplexed stream. 

To attain the above object, an information processing device according to
the present invention that decodes a multiplexed stream which includes a data
stream constituted by a plurality of source packets each having a transport packet
and its arrival time stamp, and in which a second picture, which is the first
picture of a second multiplexed stream, is connected to a first picture, which is
the last picture of a first multiplexed stream, so as to be reproduced seamlessly,
comprises: an output means for outputting the source packets according to the
arrival time stamp of the multiplexed stream; a video buffer for buffering video
data included in the source packets; an audio buffer for buffering audio data
included in the source packets; a video decoding means for decoding the video
data buffered in the video buffer; and an audio decoding means for decoding the
audio data buffered in the audio buffer, the audio buffer having a capacity
capable of buffering the audio data corresponding to the time required for
inputting the second picture to the video buffer.

In the present invention, the audio buffer has a capacity capable of
buffering the audio data corresponding to the time required for inputting the
second picture to the video buffer, and the source packet is input to the buffer
according to the arrival_time_stamp of the source packet in the multiplexed
stream even between the time at which the input of the first picture to the video
buffer is completed and the time at which the input of the last source packet of
the first multiplexed stream is completed. This eliminates the additional buffer
corresponding to 1 second that has been conventionally required for inputting
the transport packets at the maximum bit rate of the TS with the
arrival_time_stamp of the source packet ignored. Further, it is possible to input
the picture to be decoded first in the second multiplexed stream to the video
buffer by the decoding timing thereof after the last transport packet of the first
multiplexed stream has been input.

Further, EBn_max = (I_max / Rv) × Ra can be satisfied, where EBn_max
(bits) is the capacity required for the audio buffer, I_max (bits) is the bit amount
of the second picture, Rv (bps) is the input bit rate to the video buffer, and Ra
(bps) is the bit rate of the audio data. Assuming that I_max is the bit amount of,
for example, an I picture, which is the second picture, the capacity of the audio
buffer can be set to up to

EBn_max = (I_max / Rv) × Ra
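The relation EBn_max = (I_max / Rv) × Ra can be evaluated as in the following sketch. The function name and the sample values are hypothetical and chosen only for illustration.

```python
# Sketch of EBn_max = (I_max / Rv) * Ra. Names and sample values are illustrative.

def ebn_max_bits(i_max_bits: float, rv_bps: float, ra_bps: float) -> float:
    """Audio buffer capacity (bits) covering the time needed to input the second picture."""
    if rv_bps <= 0:
        raise ValueError("Rv must be positive")
    # I_max / Rv is the time to input the picture; multiply by Ra for the audio bits.
    # The product is formed first to keep the floating-point result exact for
    # integer-valued inputs that divide evenly.
    return i_max_bits * ra_bps / rv_bps


# Hypothetical example: a 1,000,000-bit I picture, a 10 Mbps video input rate
# and 640 kbps audio give 0.1 s of audio, i.e. 64,000 bits (8,000 bytes).
example_bits = ebn_max_bits(1_000_000, 10_000_000, 640_000)
```

With these sample values the picture takes 100 milliseconds to input, so the required audio buffer is two orders of magnitude smaller than one second of the same audio stream.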

It is preferable that the audio buffer has a capacity capable of buffering
the audio data corresponding to 100 milliseconds. Since the data amount of an I
picture according to the MPEG2 specification is in general 10% or less of the data
amount transmitted in 1 second, it is possible to input the I picture to the video
buffer by the decode timing thereof by setting the capacity of the audio buffer to
the size corresponding to 100 milliseconds to allow the audio buffer to move
ahead the audio data by that amount. As a result, the encoding restriction on
video data is reduced. That is, by setting the capacity of the audio buffer to the
above size, it is possible to form a multiplexed stream so that the input of the
audio data is completed 100 milliseconds earlier than the reproduction timing
thereof.
Further, assuming that STC_delta is a time difference between the
presentation end time of the first picture on the time axis of the first multiplexed
stream and the presentation start time of the second picture on the time axis of
the second multiplexed stream, STC2^1_end (= STC1^1_end - STC_delta) is a value
obtained by converting STC1^1_end, which is the value on the time axis of the
first multiplexed stream at which the last byte of the last source packet of the
first multiplexed stream is output from the output means, into the value on the
time axis of the second multiplexed stream using the time difference STC_delta,
and STC2^2_start is the value on the time axis of the second multiplexed stream
at which the first byte of the first source packet of the second multiplexed stream
is output from the output means, the multiplexed stream satisfies
STC2^2_start > STC2^1_end, thereby conforming to the DVR-STD.

Further, it is preferable to configure the information processing device
such that after a lapse of a predetermined time delta1 after the last source packet
of the first multiplexed stream has been output from the output means, the first
source packet of the second multiplexed stream is output from the output means.
In this case, STC2^2_start > STC2^1_end is satisfied. As a result, determination
of the input timing of the first source packet of the second multiplexed stream
becomes more flexible, which makes it easy to encode the second multiplexed
stream.
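The conformance condition and the effect of the gap delta1 can be illustrated as follows. This is a sketch under the assumption that delta1 is positive and expressed in the same clock units as the STC values; the function names are hypothetical.

```python
# Sketch of the condition STC2^2_start > STC2^1_end and of a start time
# derived from a gap delta1. Names are illustrative; one clock unit is assumed.

def conforms(stc1_end: int, stc_delta: int, stc2_start: int) -> bool:
    """True when the first TS2 source packet is output after the last TS1 byte."""
    stc2_end = stc1_end - stc_delta          # STC1^1_end converted to the STC2 axis
    return stc2_start > stc2_end


def ts2_start_with_gap(stc1_end: int, stc_delta: int, delta1: int) -> int:
    """Output time on the STC2 axis when TS2 starts delta1 after the end of TS1."""
    return stc1_end - stc_delta + delta1
```

Any positive delta1 yields a start time that passes the check, which reflects the added flexibility in choosing the input timing of the first source packet of the second multiplexed stream.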

Further, assuming that STC_delta is a time difference between the
presentation end time of the first picture on the time axis of the first multiplexed
stream and the presentation start time of the second picture on the time axis of
the second multiplexed stream, it is possible to configure the information
processing device such that after a lapse of a predetermined time ATC_delta after
the output of the last source packet of the first multiplexed stream has been
started, the first source packet of the second multiplexed stream is output from
the output means, the predetermined time ATC_delta is so determined as to
satisfy the time difference STC_delta, and the multiplexed stream is so formed as
to satisfy the time difference STC_delta. As a result, determination of the input
timing of the first source packet of the second multiplexed stream becomes more
flexible, which makes it easy to encode the second multiplexed stream.

In this case, it is possible to manage the predetermined time ATC_delta 
as attachment information of the first multiplexed stream. 

An information processing method according to the present invention that
decodes a multiplexed stream which includes a data stream constituted by a
plurality of source packets each having a transport packet and its arrival time
stamp, and in which a second picture, which is the first picture of a second
multiplexed stream, is connected to a first picture, which is the last picture of a
first multiplexed stream, so as to be reproduced seamlessly, comprises: a step of
outputting the source packets according to the arrival time stamp of the
multiplexed stream; a step of buffering video and audio data included in the
source packets in video and audio buffers, respectively; and a step of decoding
the video and audio data buffered in the video and audio buffers, wherein, in the
buffering step, the audio data corresponding to the time required for inputting the
second picture to the video buffer is buffered in the audio buffer before the
second picture is buffered in the video buffer.

A program according to the present invention allows a computer to
execute the aforementioned information processing. A recording medium
according to the present invention is a computer-readable recording medium that
records the program.



Another recording medium according to the present invention records a
multiplexed stream which includes a data stream constituted by a plurality of
source packets each having a transport packet and its arrival time stamp, wherein
the multiplexed stream is formed such that a second picture, which is the first
picture of a second multiplexed stream, is connected to a first picture, which is
the last picture of a first multiplexed stream, so as to be reproduced seamlessly,
the first and second multiplexed streams can be input to a decoder based on their
respective arrival time stamps, and the input of the audio data corresponding to
the time required for inputting the second picture to the decoder can be
completed by the time at which the input of the second picture to the decoder is
started.

In the present invention, a multiplexed stream is formed such that the
input of the audio data corresponding to the time required for inputting the
second picture to the decoder is completed by the time at which the input of the
second picture to the decoder is started. As a result, it is possible to input the
picture decoded first in the second multiplexed stream to the video buffer by the
decode timing thereof after the last transport packet of the first multiplexed
stream is input, by decoding the multiplexed stream using a decoder including
an audio buffer having a capacity capable of buffering the audio data
corresponding to the time required for inputting the second picture to the video
buffer.



Another information processing device according to the present invention 
that generates a multiplexed stream which includes a data stream constituted by 
a plurality of source packets each having a transport packet and its arrival time 
stamp, and which is read out and decoded by a decoder based on the arrival time 
stamp, comprises: a video encoding means for generating a first video encoding 
stream to end the presentation with a first picture and a second video encoding 
stream that starts the presentation with a second picture to be presented 
immediately after the first picture; and a multiplexing means for multiplexing 
the first video encoding stream and an audio encoding stream synchronized with
the first video encoding stream to generate a first multiplexed stream, 
multiplexing the second video encoding stream and an audio encoding stream 
synchronized with the second video encoding stream to generate a second 
multiplexed stream, and generating a multiplexed stream in which a second 
picture, which is the first picture of a second multiplexed stream, is connected to 
a first picture, which is the last picture of a first multiplexed stream so as to be 
reproduced seamlessly, wherein the multiplexing means multiplexes such that 
the input of the audio data corresponding to the time required for inputting the 
second picture to the decoder can be completed by the time at which the input of 
the second picture to the decoder is started. 

In the present invention, multiplexing is performed such that the input of
the audio data corresponding to, e.g., 100 milliseconds, which corresponds to the
time required for inputting the second picture to the decoder, is completed by the
time at which the input of the second picture to the decoder is started. As a
result, in the decoder, audio data is moved ahead to the audio buffer to
sufficiently assure the time to transmit the second picture, such as an I picture, by
the decode timing thereof, which makes it easy to encode the multiplexed stream.

Another information processing method according to the present invention that
generates a multiplexed stream which includes a data stream constituted by a
plurality of source packets each having a transport packet and its arrival time
stamp, and which is read out and decoded by a decoder based on the arrival time
stamp, comprises: a step of generating a first video encoding stream to end the
presentation with a first picture and a second video encoding stream that starts
the presentation with a second picture to be presented immediately after the first
picture; and a step of multiplexing the first video encoding stream and an audio
encoding stream synchronized with the first video encoding stream to generate a
first multiplexed stream, multiplexing the second video encoding stream and an
audio encoding stream synchronized with the second video encoding stream to
generate a second multiplexed stream, and generating a multiplexed stream in
which a second picture, which is the first picture of a second multiplexed stream,
is connected to a first picture, which is the last picture of a first multiplexed
stream, so as to be reproduced seamlessly, wherein multiplexing is performed
such that the input of the audio data corresponding to the time required for
inputting the second picture to the decoder can be completed by the time at
which the input of the second picture to the decoder is started.

The above and other objects, advantages and features of the present
invention will be more apparent from the following description taken in
conjunction with the accompanying drawings.

Brief Description of the Drawings 

FIG. 1 is a block diagram showing a conventional information processing
device;

FIG. 2 is a view showing a model of the relation between a previous
PlayItem and a current PlayItem in the case of using Bridge-Clip;

FIG. 3 is a view showing a model of the relation between a previous
PlayItem and a current PlayItem in the case of not using Bridge-Clip;

FIG. 4 is a view showing, in the presentation order of pictures, models of
Clip 1 and Clip 2, which are video streams to be seamlessly connected to each
other;

FIGS. 5(a), 5(b) and 5(c) show an example of a data stream in each AV
stream in which the video streams (Clip 1 and Clip 2) shown in FIG. 4 are
seamlessly connected using BridgeSequence (first method);

FIG. 6 shows an example of a data stream in each AV stream in which
the video streams (Clip 1 and Clip 2) shown in FIG. 4 are seamlessly connected
by using the second method that does not use BridgeSequence;

FIG. 7 is a view for explaining an overlap of audio presentation, and
shows a model of video presentation units and audio presentation units in TS1
and TS2;

FIG. 8 is a block diagram showing an information processing device
according to the embodiment of the present invention;

FIG. 9 is a timing chart of input, decoding, and presentation of the
transport packet during the shift between a certain AV stream (TS1) and the next
AV stream (TS2) seamlessly connected to the AV stream (TS1);

FIG. 10 is another example of a timing chart of input, decoding, and
presentation of the transport packet during the shift between a certain AV stream
(TS1) and the next AV stream (TS2) seamlessly connected to the AV stream
(TS1);

FIG. 11 is another example of a timing chart of input, decoding, and
presentation of the transport packet during the shift between a certain AV stream
(TS1) and the next AV stream (TS2) seamlessly connected to the AV stream
(TS1);

FIG. 12 shows a data format of attachment information ClipInfo() for
storing ATC_delta;

FIG. 13 is a view showing a model of attachment information ClipInfo()
in the case where a plurality of AV streams (TS2s) are to be connected to a
certain AV stream (TS1);

FIGS. 14(a) and 14(b) are graphs showing examples of changes in the bit
occupation amount of the video buffer and audio buffer of the DVR-STD during
the shift between TS1 and TS2 connected seamlessly to TS1 in the case where the
size of the audio buffer is 4 kbytes, in the conventional DVR-STD; and

FIGS. 15(a) and 15(b) are views for explaining an advantage in the
embodiment of the present invention, and are graphs showing examples of
changes in the bit occupation amount of the video and audio buffers of the
DVR-STD during the shift between TS1 and TS2 that is seamlessly connected to
TS1 in the case where the size of the audio buffer is 8 kbytes.

Best Mode for Carrying Out the Invention 

An embodiment of the present invention will be described in detail below
with reference to the accompanying drawings. This embodiment is obtained by
applying the present invention to an information processing device that
continuously reproduces, in a seamless manner, two multiplexed AV streams
consisting of video and audio streams. The present embodiment will propose
an audio buffer that has the optimum capacity available at reproduction of the
two AV streams that are seamlessly connected to each other, provided in a
DVR-STD (Digital Video Recording-System Target Decoder).

Firstly, the terms used in the following description are defined. "Clip"
denotes a multiplexed stream of video and audio streams. "PlayList" denotes a
group of reproduction zones in the Clip. One reproduction zone in some Clip is
called "PlayItem", which is represented by a pair of IN point and OUT point on a
time axis. That is, PlayList is a group of PlayItems.

"Reproducing PlayItems in a seamless manner" denotes a state where a
reproducing device (player) presents (reproduces) audio/video data recorded in a
disc while preventing a gap or pause in the reproduction output of a decoder.

A structure in which the two PlayItems are seamlessly connected to each
other will be described. Whether seamless presentation of a previous PlayItem
and a current PlayItem is assured or not can be judged by the
connection_condition field defined in the current PlayItem. As the method for
realizing seamless connection between PlayItems, a method (first method) using
Bridge-Clip (BridgeSequence), and another method (second method) that does
not use Bridge-Clip are available.

Firstly, TS1 and TS2 in the case where the previous PlayItem and the
current PlayItem are connected using BridgeSequence (first method) will be
described. FIG. 2 is a view showing a model of the relation between a previous
PlayItem (PI1) and a current PlayItem (PI2) in the case of using Bridge-Clip
(first method). In FIG. 2, the stream data to be read out by a player in the case
of using Bridge-Clip are shown in a shaded manner. A DVR MPEG (Moving
Picture Experts Group) 2 transport stream (TS) consists of an integer number of
Aligned units. An Aligned unit has 6,144 bytes (2048 × 3 bytes). One Aligned
unit includes 32 source packets and starts in the first byte of a source packet.

Each source packet has a 192 byte length. One source packet consists of
TP_extra_header having a 4 byte length and a transport packet having a 188 byte
length. TP_extra_header has copy_permission_indicator and
arrival_time_stamp. Copy_permission_indicator is an integer indicating the copy
restriction of the Payload of the transport packet. Arrival_time_stamp (ATS) is a
time stamp indicating the time when the corresponding transport packet in the AV
stream reaches a decoder. The time axis based on the arrival_time_stamp of each
source packet constituting an AV stream is referred to as the arrival time base,
and its clock is called ATC (Arrival Time Clock).
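The packet layout described above can be sketched as a small parser. The sizes (192, 4, 188 and 6,144 bytes) come from the text; the bit layout of TP_extra_header is an assumption made for illustration only, namely a big-endian 32-bit word whose upper 2 bits hold copy_permission_indicator and whose lower 30 bits hold arrival_time_stamp. The function names are hypothetical.

```python
# Sketch of splitting an Aligned unit into source packets and reading the header.
# ASSUMPTION: TP_extra_header is a big-endian 32-bit word with a 2-bit
# copy_permission_indicator followed by a 30-bit arrival_time_stamp.
import struct

SOURCE_PACKET_SIZE = 192
TP_EXTRA_HEADER_SIZE = 4
TRANSPORT_PACKET_SIZE = 188
ALIGNED_UNIT_SIZE = 6144           # 32 source packets of 192 bytes each


def parse_source_packet(packet: bytes):
    """Return (copy_permission_indicator, arrival_time_stamp, transport_packet)."""
    if len(packet) != SOURCE_PACKET_SIZE:
        raise ValueError("a source packet must be exactly 192 bytes")
    (header,) = struct.unpack(">I", packet[:TP_EXTRA_HEADER_SIZE])
    copy_permission_indicator = header >> 30        # upper 2 bits (assumed layout)
    arrival_time_stamp = header & 0x3FFFFFFF        # lower 30 bits (assumed layout)
    return copy_permission_indicator, arrival_time_stamp, packet[TP_EXTRA_HEADER_SIZE:]


def split_aligned_unit(unit: bytes) -> list:
    """Split one 6,144-byte Aligned unit into its 32 source packets."""
    if len(unit) != ALIGNED_UNIT_SIZE:
        raise ValueError("an Aligned unit must be exactly 6144 bytes")
    return [unit[i:i + SOURCE_PACKET_SIZE]
            for i in range(0, ALIGNED_UNIT_SIZE, SOURCE_PACKET_SIZE)]
```

A player following this layout would read Aligned units from the disc, split them, and hand each transport packet to the decoder when its arrival_time_stamp matches the arrival time clock.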

TS1 (first multiplexed stream) shown in FIG. 2 consists of stream data
D1, which is a shaded part of Clip 1 (Clip AV stream), and stream data D2,
which is a part before SPN_arrival_time_discontinuity in the shaded Bridge-Clip.
SPN_arrival_time_discontinuity indicates an address of the source packet at
which discontinuity of the arrival time base exists in the Bridge-Clip AV stream
file.

The stream data D1 in TS1 corresponding to a shaded part of Clip 1 is
stream data from the address of the stream needed for decoding the presentation
unit corresponding to IN_time (IN_time 1 in FIG. 2) of a previous PlayItem to
the source packet referred to by SPN_exit_from_previous_Clip.



The stream data D2 in TSl, which is a part before 
SPN_airival_time_discontinuity of shaded Bridge<:iip, is stream data from the 
f^t source packet in the Bridge^lip to the source packet immediately before 
the source packet referred to by SPN_arrival_time_discontinuity. 

TS2 (second multiplexed stream) shown in FIG. 2 consists of stream
data D4, which is a shaded part of Clip 2 (Clip AV stream), and stream data
D3, which is a part after SPN_arrival_time_discontinuity in the shaded
Bridge-Clip.

The stream data D3 in TS2, which is a part after
SPN_arrival_time_discontinuity in the shaded Bridge-Clip, is stream data from
the source packet referred to by SPN_arrival_time_discontinuity to the last
source packet of Bridge-Clip.

The stream data D4 in TS2, which is a shaded part of Clip 2, is stream
data from the source packet referred to by SPN_enter_to_current_Clip to the
address of the stream needed for decoding the presentation unit corresponding to
OUT_time (OUT_time 2 in FIG. 2) of a current PlayItem.
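
The composition of TS1 and TS2 in the BridgeSequence case can be sketched as list slicing over source packets. This is only an illustration under stated assumptions: each clip is modeled as a Python list indexed by source packet number, the start and end addresses (`in_addr`, `out_addr`) are treated as inclusive packet indices, and the function and parameter names are hypothetical.

```python
def build_ts1_ts2(clip1, bridge, clip2,
                  in_addr, spn_exit, spn_disc, spn_enter, out_addr):
    """Return (TS1, TS2) as lists of source packets for the BridgeSequence case."""
    d1 = clip1[in_addr:spn_exit + 1]    # D1: up to SPN_exit_from_previous_Clip
    d2 = bridge[:spn_disc]              # D2: Bridge-Clip before the discontinuity
    d3 = bridge[spn_disc:]              # D3: from SPN_arrival_time_discontinuity on
    d4 = clip2[spn_enter:out_addr + 1]  # D4: from SPN_enter_to_current_Clip
    return d1 + d2, d3 + d4

# Toy clips whose "packets" are labels, so the result is easy to read.
clip1 = [f"c1-{i}" for i in range(10)]
bridge = [f"b-{i}" for i in range(6)]
clip2 = [f"c2-{i}" for i in range(10)]
ts1, ts2 = build_ts1_ts2(clip1, bridge, clip2,
                         in_addr=2, spn_exit=5, spn_disc=3,
                         spn_enter=4, out_addr=7)
```

With these toy values TS1 is packets 2 to 5 of Clip 1 followed by the first three Bridge-Clip packets, and TS2 is the remaining Bridge-Clip packets followed by packets 4 to 7 of Clip 2.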

Next, TS1 and TS2 in the case where the previous PlayItem and the
current PlayItem are connected to each other by the second method that does not
use BridgeSequence will be described. FIG. 3 is a view showing a model of
the relation between the previous PlayItem (PI1) and the current PlayItem (PI2)
connected to each other by the second method that does not use BridgeSequence.
In FIG. 3, the stream data to be read out by a player are shown in a shaded
manner.



TS1 (first multiplexed stream) shown in FIG. 3 includes stream data D5,
which is a shaded part of Clip 1 (Clip AV stream). The stream data D5 in TS1,
which is a shaded part of Clip 1, is data from the address of the stream needed
for decoding the presentation unit corresponding to IN_time (IN_time 1 in FIG.
3) of the previous PlayItem to the last source packet of Clip 1.

TS2 (second multiplexed stream) shown in FIG. 3 consists of stream data
D6, which is a shaded part of Clip 2 (Clip AV stream). The stream data D6 in
TS2, which is a shaded part of Clip 2, is stream data from the first source packet
of Clip 2 to the address of the stream needed for decoding the presentation unit
corresponding to OUT_time (OUT_time 2 in FIG. 3) of the current PlayItem.

In FIGS. 2 and 3, TS1 and TS2 constitute a data stream in which source
packets are consecutive. Next, a stream regulation of TS1 and TS2, and a
connection condition between them will be described.

While TS1 and TS2 are obtained by multiplexing video and audio
streams, firstly a restriction on a video bit stream in an encoding restriction for
seamless connection will be described here.

FIG. 4 is a view showing, in the presentation order of pictures, models of
Clip 1 and Clip 2, which are video streams to be seamlessly connected to each
other. When a video picture program is reproduced with a part thereof being
skipped, it is necessary to perform re-encoding processing of the video stream
in a decoding device in order to seamlessly connect an out-point side program,
which is positioned before an out-point picture serving as a starting point of
skip action, and an in-point side program, which is positioned after an in-point
picture serving as a reaching point of skip reproduction.

A GOP (group of pictures), a unit of pictures according to the MPEG
specification, includes three types of encoded images: one or more I (Intra)
pictures (intra-frame encoded images), each of which is a reference image in
which an image has been encoded without predictive encoding from other
pictures; P (predictive) pictures, each of which is a forward direction
predictive encoded image obtained by encoding an image using predictive
encoding in the same direction as the presentation order; and B (bidirectionally
predictive) pictures, each of which is a bidirectionally predictive encoded image
obtained by using predictive encoding in both the forward and reverse
directions. In FIG. 4, each number denotes the presentation order, and the
letters I, P, B (or i, p, b) denote the types of the picture. FIG. 4 shows the
case where B7 of Clip 1 and b4 of Clip 2 are connected. In order to seamlessly
present the video stream at the connection point, unnecessary pictures
positioned after OUT_time 1 (OUT_time of Clip 1) and before IN_time 2
(IN_time of Clip 2) must be removed by the process that encodes the partial
stream of the Clip in the vicinity of the connection point.

FIGS. 5(a), 5(b) and 5(c) show an example in which the video streams
(Clip 1 and Clip 2) shown in FIG. 4 are seamlessly connected using
BridgeSequence (first method). The video stream of Bridge-Clip before
SPN_arrival_time_discontinuity consists of the encoded video stream including
pictures before OUT_time 1 of Clip 1 shown in FIG. 4. After the video stream
is connected to the previous video stream of Clip 1, the two video streams are
re-encoded to be one continuous elementary stream conforming to the MPEG2
specification. Similarly, the video stream of Bridge-Clip after
SPN_arrival_time_discontinuity consists of the video stream including pictures
after IN_time 2 of Clip 2 shown in FIG. 4. The video stream, decoding of
which can be started properly, is connected to the subsequent video stream of
Clip 2. After that, the connected two video streams are re-encoded to be one
continuous elementary stream conforming to the MPEG specification. In order to
create Bridge-Clip, several pictures must be re-encoded, and other pictures can
be obtained by copying the original Clip.

FIG. 5(a) represents Clip 1 shown in FIG. 4 in the presentation order
thereof. The player jumps from the source packet number
(SPN_exit_from_previous_Clip) of P5 of the previous Clip 1 to Bridge-Clip
shown in FIG. 5(b). In D2 of Bridge-Clip shown in FIG. 2, that is, in the
stream data on the OUT_time 1 side of Clip 1 corresponding to the video data
before SPN_arrival_time_discontinuity in BridgeSequence, data d1 including
pictures up to B4 consists of the data obtained by copying Clip 1 without change,
and data d2, which consists of pictures B6 and B7 of the original Clip 1 in a
normal situation, includes P7 and B6 obtained by decoding Clip 1 to
non-compressed image data and re-encoding them. Further, also in D3 of
Bridge-Clip shown in FIG. 2, that is, in the stream data on the IN_time 2 side
of Clip 2 corresponding to the video data after SPN_arrival_time_discontinuity
in BridgeSequence, pictures b4, p5, p8, b6, b7 of the original Clip 2 become
newly created data (i0, p1, p4, b2, b3) d3 obtained by once decoding Clip 2 to
non-compressed image data and re-encoding them. Data d4 before jumping to
SPN_enter_to_current_Clip is obtained by copying Clip 2 without change.

FIG. 6 shows an example in which the video streams (Clip 1 and Clip 2)
shown in FIG. 4 are seamlessly connected by using the second method that does
not use BridgeSequence. In Clip 1 and Clip 2 shown in FIG. 6, pictures are
arranged in the presentation order. Even in the case where BridgeSequence is
not used, the stream in the vicinity of the connection point is once decoded to
non-compressed data and re-encoded to the optimum picture type, in the same
way as when BridgeSequence is used as shown in FIG. 5. That is, the video
stream of Clip 1 includes the encoded video stream up to the picture
corresponding to OUT_time 1 shown in FIG. 4, wherein B6, B7 of the original
Clip 1 are re-encoded to data (P7, B6) d5 so as to be one continuous elementary
stream conforming to the MPEG2 specification. Similarly, the video stream of
Clip 2 includes the encoded video stream after the picture corresponding to
IN_time 2 of Clip 2 shown in FIG. 4, wherein b4, p5, p8, b6, b7 of the original
Clip 2 are re-encoded to data (i0, p1, p4, b2, b3) d6 so as to be one continuous
elementary stream conforming to the MPEG2 specification.

Next, the encoding restriction of multiplexed streams of TS1 and TS2
will be described. FIG. 7 is a view for explaining an overlap of audio
presentation, and shows a model of video presentation units VPU1 and VPU2
and audio presentation units APU1 and APU2 in TS1 and TS2.

As shown in FIG. 7, the last audio frame A_end of the audio stream of
TS1 includes an audio sample having the presentation time equal to the
presentation end time (OUT_time1) of the last presentation picture of TS1.
The first audio frame A_start of the audio stream of TS2 includes an audio
sample having the presentation time equal to the presentation start time
(IN_time2) of the first presentation picture of TS2. Thus, no gap exists in the
sequence of audio presentation units at the connection point between TS1 and
TS2, and an audio overlap defined by the length of audio presentation unit less
than 2 audio frames is generated. TS at the connection point is DVR MPEG2
TS according to DVR-STD (Digital Video Recording-System Target Decoder) to
be described later.

DVR-STD is a conceptual model for modeling decode processing in
generating and examining the AV stream referred to by two PlayItems that have
been connected seamlessly to each other. FIG. 8 shows the DVR-STD model
(DVR MPEG2 transport stream player model).

As shown in FIG. 8, an information processing device (DVR MPEG2
transport stream player model, which is hereinafter referred to as "player") 1
according to the embodiment includes an output section 10 that reads out
transport packets from TS connected for seamless reproduction and outputs them,
and a decoder (DVR-STD) 20 that decodes transport packets from the output
section 10. As described later, the decoder 20 has been obtained by changing
the input timing of transport packets and capacity of audio buffer in the
aforementioned conventional DVR-STD. In the output section 10, the TS file
read out at a readout rate RUD from a readout section (DVR drive) 11 is buffered
in a read buffer 12. From the read buffer 12, a source packet is read out into a
source depacketizer 13 at a bit rate RMAX. RMAX is a bit rate of a source packet
stream.



A pulse oscillator (27 MHz X-tal) 14 generates a 27 MHz pulse. An 
arrival time clock counter 15 is a binary counter that counts the 27 MHz 
frequency pulse, and supplies the source depacketizer 13 with 
Arrival_time_clock (i) which is a count value of Arrival time clock counter at 
time t (i). 

As described above, one source packet includes one transport packet and
its arrival_time_stamp. When arrival_time_stamp of the current source packet
is equal to the value of the 30 LSBs of arrival_time_clock (i), a transport packet
of the current source packet is output from the source depacketizer 13.
TS_recording_rate is a bit rate of TS.

Notations of n, TBn, MBn, EBn, TBsys, Bsys, Rxn, Rbxn, Rxsys, Dn,
Dsys, On, and Pn(k) shown in FIG. 8 are the same as those defined in T-STD
(transport stream system target decoder specified by ISO/IEC 13818-1) of
ISO/IEC 13818-1 (MPEG2 systems specification). That is, as follows.

n: index number of elementary stream
TBn: transport buffer of elementary stream n
MBn (exists only in video stream): multiplexing buffer of elementary stream n
EBn: elementary stream buffer of elementary stream n, which exists only in
video stream
TBsys: input buffer for system information of the program that is being decoded
Bsys: main buffer in system target decoder for system information of the
program that is being decoded
Rxn: transfer rate at which data is removed from TBn
Rbxn (exists only in video stream): transfer rate at which PES packet payload is
removed from MBn
Rxsys: transfer rate at which data is removed from TBsys
Dn: decoder of elementary stream n
Dsys: decoder related to system information of the program that is being
decoded
On: re-ordering buffer of video stream n
Pn(k): k-th presentation unit of elementary stream n

Next, decoding process of the decoder 20 will be described. Firstly, the
decoding process during reproduction of a single DVR MPEG2 TS will be
described.

During reproduction of a single DVR MPEG2 TS, the timing at which
the transport packet is input to the buffer TB1, TBn or TBsys is determined by
arrival_time_stamp of the source packet.

Buffering operations of TB1, MB1, EB1, TBn, Bn, TBsys and Bsys are
specified in a similar manner as in the T-STD specified by ISO/IEC 13818-1.
Decoding and presentation operations thereof are also specified in a similar
manner as in the T-STD specified by ISO/IEC 13818-1.

Next, decoding process during reproduction of PlayItems that are
seamlessly connected will be described. FIG. 9 is a timing chart of input,
decoding, and presentation of the transport packet during the shift between a
certain AV stream (TS1) and the next AV stream (TS2) seamlessly connected to
the AV stream (TS1).

Here, a description is given of two AV streams that are referred to by
seamlessly connected PlayItems. In the later description, reproduction of TS1
and TS2 that have been seamlessly connected, shown in FIG. 2 or FIG. 3, will be
described. That is, TS1 is the previous stream, and TS2 is the current stream.
Respective packets sectioned by TS1 and TS2 represent source packets SP1 and
SP2 of TS1 and TS2.

During the shift between a certain AV stream (TS1) and the next AV
stream (TS2) seamlessly connected to the AV stream (TS1), the time axis (ATC2
in FIG. 9) of TS2 arrival time base is not the same as that (ATC1 in FIG. 9) of
TS1 arrival time base. Further, the time axis (STC2 in FIG. 9) of TS2 system
time base is not the same as that (STC1 in FIG. 9) of TS1 system time base.
The presentation of video images needs to be seamless. An overlap may exist
in the presentation time of audio presentation unit.

In the player 1 of the present embodiment, the audio buffer having the
optimum capacity is obtained by changing the following two points with respect
to the aforementioned player 101 described in Jpn. Pat. Appln. Laid-Open
Publication No. 2000-175152, Jpn. Pat. Appln. Laid-Open Publication No.
2001-544118, and Jpn. Pat. Appln. Laid-Open Publication No. 2002-158974.
The description begins with the first changing point. The first changing point is
that input of the packets of TS1 up to the last packet into the decoder 20 is
determined by arrival_time_stamp of their source packets during the shift
between a certain AV stream (TS1) and the next AV stream (TS2) seamlessly
connected to the AV stream (TS1).

That is, as described above, in the conventional player 101, the transport
packet is input to the buffer at the maximum bit rate of TS with
arrival_time_stamp ignored between time T1 when the last video packet of TS1
has been input to TB1 and time T2 when the input of the last byte of TS1 has
been completed, whereas in the present embodiment, input of the source packet
between T1 and T2 is determined by arrival_time_stamp of the source packets of
TS1 as with the case of before time T1. This eliminates the additional buffer
corresponding to 1 second that has been conventionally required for inputting
the transport packet at the maximum bit rate RMAX of TS with arrival_time_stamp
of the source packet ignored.

The input timing to the decoder 20 in this case will be described with
reference to FIG. 9.

(1) Before time T1

Before time T1, that is, until the input of the last video packet of TS1 to
the decoder 20 has been completed, the input timing to the buffer TB1, TBn or
TBsys of the decoder 20 is determined by arrival_time_stamp of the source
packet SP1 of TS1.

(2) From time T1 to time T2

The input timing of the remaining packets of TS1 to the decoder 20 is
also determined by arrival_time_stamp of the source packet SP1 of TS1. The
time when the last byte of TS1 is input to the buffer is time T2.

(3) After time T2

At time T2, the arrival time clock counter 15 is reset to the value of
arrival_time_stamp of the first source packet of TS2. The input timing to the
buffer TB1, TBn or TBsys of the decoder 20 is determined by
arrival_time_stamp of the source packet SP2 of TS2.

That is, the input timing to the buffer TB1, TBn or TBsys of the decoder
20 is determined by arrival_time_stamp of the source packet SP1 of TS1 before
time T2 at which the input of the last byte of TS1 to the decoder 20 has been
completed, and determined by arrival_time_stamp of the source packet SP2 of
TS2 after time T2.
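
The three phases above can be sketched as a small scheduling function. It is a simplified model, not the decoder specification: time is one continuous count of 27 MHz ticks starting at the first TS1 packet, the transfer time of a packet is a single hypothetical parameter `packet_ticks`, each stream is assumed to be shorter than one wrap of the 30-bit counter, and all names are illustrative.

```python
ATS_MODULO = 1 << 30  # arrival_time_stamp is compared with 30 bits of the counter

def input_schedule(ts1_ats, ts2_ats, packet_ticks):
    """Return [(stream, input_start_tick), ...] on one continuous timeline."""
    schedule = []
    # Phases (1) and (2): every TS1 packet, including those after T1,
    # is paced by its own arrival_time_stamp.
    for ats in ts1_ats:
        schedule.append(("TS1", (ats - ts1_ats[0]) % ATS_MODULO))
    # T2: the last byte of the last TS1 packet has entered the decoder.
    t2 = schedule[-1][1] + packet_ticks
    # Phase (3): the counter is reset to the first ATS of TS2 at T2,
    # so TS2 packets are paced relative to that first time stamp.
    for ats in ts2_ats:
        schedule.append(("TS2", t2 + (ats - ts2_ats[0]) % ATS_MODULO))
    return schedule

sched = input_schedule(ts1_ats=[1000, 1500, 2400],
                       ts2_ats=[50, 600],
                       packet_ticks=100)
```

In this toy run the first TS2 packet enters at tick 1500, immediately after the last byte of TS1, even though its own time stamp (50) is unrelated to the TS1 arrival time base.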

Next, video presentation timing will be described. A video presentation
unit must be presented seamlessly through the aforementioned connection points
as shown in FIGS. 2 and 3. That is, the last video data (first picture) of TS1
and the first video data (second picture) of TS2 are reproduced seamlessly. It is
assumed that

STC1: time axis of TS1 system time base
STC2: time axis of TS2 system time base (correctly, STC2 starts from the time
when the first PCR (Program Clock Reference) of TS2 has been input to
T-STD).

An offset value between STC1 and STC2 is determined as follows.
Assuming that

$\mathrm{PTS}^{1}_{end}$: PTS on STC1 corresponding to the last video presentation unit
of TS1
$\mathrm{PTS}^{2}_{start}$: PTS on STC2 corresponding to the first video presentation unit
of TS2
$T_{pp}$: presentation period of the last video presentation unit of TS1,

the offset value STC_delta between two system time bases is represented by
the following equation (6).

$$\mathrm{STC\_delta} = \mathrm{PTS}^{1}_{end} + T_{pp} - \mathrm{PTS}^{2}_{start} \qquad (6)$$

Next, audio presentation timing will be described. An overlap of the
presentation timing of the audio presentation unit may exist at the connection
point of TS1 and TS2, the overlap being from 0 to less than 2 audio frames (refer
to audio overlap in FIG. 9). The player 101 must select one of the audio
samples and re-synchronize the presentation of the audio presentation unit with
the corrected time base after the connection point.

The processing for control of system time clock of the decoder 20
carried out by the player when the time shifts from TS1 to TS2 seamlessly
connected to TS1 will be described.

At time T5, the last audio presentation unit of TS1 is presented. The
system time clocks may be overlapped between time T1 and T5. Between time
T2 and T5, the decoder 20 switches the system time clock from the value
(STC1) of the old time base to the value (STC2) of the new time base. The
value of STC2 can be represented by the following equation (7).

$$\mathrm{STC2} = \mathrm{STC1} - \mathrm{STC\_delta} \qquad (7)$$
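
A short worked sketch of the offset between the two system time bases and the conversion from STC1 to STC2 follows. The numeric values are illustrative assumptions expressed in 90 kHz ticks (3003 ticks being one frame period at roughly 29.97 frames per second), counter wrap-around is ignored, and the function names are hypothetical.

```python
def stc_delta(pts1_end, tpp, pts2_start):
    """Offset between the two system time bases: PTS1_end + Tpp - PTS2_start."""
    return pts1_end + tpp - pts2_start

def to_stc2(stc1, delta):
    """Convert a value on STC1 to the STC2 time base: STC2 = STC1 - STC_delta."""
    return stc1 - delta

pts1_end = 900_000    # assumed PTS of the last video presentation unit of TS1
tpp = 3003            # assumed presentation period of that unit
pts2_start = 180_000  # assumed PTS of the first video presentation unit of TS2

delta = stc_delta(pts1_end, tpp, pts2_start)
# The instant at which the last TS1 picture stops being presented, seen on STC2:
end_of_ts1_on_stc2 = to_stc2(pts1_end + tpp, delta)
```

The converted end time of the last TS1 picture coincides with the start time of the first TS2 picture, which is what seamless video presentation requires.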


A coding condition that TS1 and TS2 must meet when the time shifts
from TS1 to TS2 seamlessly connected to TS1 will be described.
It is assumed that

$\mathrm{STC1}^{1}_{end}$: value of STC on system time base STC1 when the last byte of
the last packet of TS1 has reached the decoder 20
$\mathrm{STC2}^{2}_{start}$: value of STC on system time base STC2 when the first byte of
the first packet of TS2 has reached the decoder 20
$\mathrm{STC2}^{1}_{end}$: value obtained by converting the value of $\mathrm{STC1}^{1}_{end}$ to the
value on system time base STC2.

In this case, $\mathrm{STC2}^{1}_{end}$ is represented by the following equation (8).

$$\mathrm{STC2}^{1}_{end} = \mathrm{STC1}^{1}_{end} - \mathrm{STC\_delta} \qquad (8)$$

It is necessary to meet the following two conditions in order for the
decoder 20 to conform to the DVR-STD.

(Condition 1)

The timing at which the first packet of TS2 reaches the decoder 20 must meet
the following inequality (9).

$$\mathrm{STC2}^{2}_{start} > \mathrm{STC2}^{1}_{end} \qquad (9)$$

The partial streams of Clip 1 and/or Clip 2 need to be re-encoded and/or
re-multiplexed in order to meet the above inequality (9).

(Condition 2)

On the time axis of the system time base obtained by converting STC1
and STC2 to the same time axis as each other, inputs of the video packets from
TS1 and the subsequent video packets from TS2 should neither overflow nor
underflow the video buffer. Further, on the time axis of the system time base
obtained by converting STC1 and STC2 to the same time axis as each other,
inputs of the packets from TS1 and the subsequent packets from TS2 should
neither overflow nor underflow any of the buffers in the decoder 20.
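
Condition 1 can be checked with a few lines of arithmetic. The sketch below converts the end of TS1 input to the STC2 time base and compares it with the start of TS2 input; the numeric values are illustrative assumptions in 90 kHz ticks and the function name is hypothetical. Condition 2 needs a full buffer simulation and is not covered here.

```python
def meets_condition_1(stc1_end, stc2_start, stc_delta):
    """True if the first packet of TS2 arrives after the last byte of TS1.

    stc1_end   : STC1 value when the last byte of TS1 reaches the decoder
    stc2_start : STC2 value when the first byte of TS2 reaches the decoder
    stc_delta  : offset between the two system time bases
    """
    stc2_end = stc1_end - stc_delta  # end of TS1 input expressed on STC2
    return stc2_start > stc2_end     # arrival order required by Condition 1

ok = meets_condition_1(stc1_end=880_000, stc2_start=157_500, stc_delta=723_003)
too_early = meets_condition_1(stc1_end=880_000, stc2_start=156_000,
                              stc_delta=723_003)
```

When the check fails, as in the second call, the partial streams near the connection point would have to be re-encoded or re-multiplexed so that TS2 starts arriving later on STC2.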

FIG. 10 is another example of timing chart of input, decoding, and
presentation of the transport packet during the shift between a certain AV stream
(TS1) and the next AV stream (TS2) seamlessly connected to the AV stream
(TS1). Also in this case, the input timing of packets up to the last packet of
TS1 to the decoder 20 is determined by arrival_time_stamp of their source
packets. One point that differs from the timing chart shown in FIG. 9 is that a
predetermined time interval (delta1: interval between time T2 and T2') is
provided, as shown in FIG. 10, so as to eliminate the need to input the first
packet of TS2 immediately after the last packet of TS1. As a result,
determination of the input timing of the first packet of TS2 becomes more
flexible than in the case of FIG. 9, which makes it easy to encode TS2.

The input timing to the decoder 20 in this case will be described with
reference to FIG. 10.

(1) Before time T2

Before time T2, that is, until the input of the last byte of the last packet of
TS1 to the decoder 20 has been completed, the input timing to the buffer TB1,
TBn or TBsys of the decoder 20 is determined by arrival_time_stamp of the
source packet SP1 of TS1.

(2) After time T2'

When the time has reached time T2' through the time T2 and delta1, the
arrival time clock counter 15 is reset to the value of arrival_time_stamp of the
first source packet of TS2. The input timing to the buffer TB1, TBn or TBsys
of the decoder 20 is determined by arrival_time_stamp of the source packet SP2
of TS2.

In the case where the delta1 is provided as shown in FIG. 10, the
aforementioned $\mathrm{STC2}^{2}_{start}$ and $\mathrm{STC2}^{1}_{end}$ must meet the following relational
expression (10).

$$\mathrm{STC2}^{2}_{start} > \mathrm{STC2}^{1}_{end} + \mathrm{delta1} \qquad (10)$$

FIG. 11 is another example of timing chart of input, decoding, and
presentation of the transport packet during the shift between a certain AV stream
(TS1) and the next AV stream (TS2) seamlessly connected to the AV stream
(TS1). Also in this case, the input timing of packets up to the last packet of
TS1 to the decoder 20 is determined by arrival_time_stamp of their source
packets. One point that differs from the timing chart shown in FIG. 10 is that a
predetermined time interval (ATC_delta: interval between time T2 and T2') is
provided, as shown in FIG. 11. As a result, determination of the input timing
of the first packet of TS2 becomes more flexible than in the case of FIG. 9,
which makes it easy to encode TS2.

The input timing to the decoder 20 in this case will be described with
reference to FIG. 11.

(1) Before time T2

Before time T2, that is, until the input of the last byte of the last packet of
TS1 to the decoder 20 has been completed, the input timing to the buffer TB1,
TBn or TBsys of the decoder 20 is determined by arrival_time_stamp of the
source packet SP1 of TS1.

(2) From time T2 to time T2'

Time T2' is the time at which the first packet of TS2 is input to the
decoder 20. ATC_delta is an offset time from arrival_time_stamp (time on
ATC1) of the last packet of TS1 to time T2' that has been projected on the
ATC1.

(3) After time T2'

At time T2', the arrival time clock counter 15 is reset to the value of
arrival_time_stamp of the first source packet of TS2. The input timing to the
buffer TB1, TBn, or TBsys of the decoder 20 is determined by
arrival_time_stamp of the source packet SP2 of TS2.

The value of ATC_delta is so determined as to meet STC_delta of the
above equation (6).

The value of ATC_delta is managed as attachment information of stream
data. When TS1 and TS2 are connected seamlessly to each other as shown in
FIG. 11, the value of ATC_delta is managed as attachment information of TS1.

FIG. 12 shows a data format of attachment information ClipInfo() for
storing ATC_delta.

In FIG. 12, is_ATC_delta is a flag indicating whether ClipInfo() has the
value of ATC_delta. A plurality of values can be registered in ClipInfo() for
allowing a plurality of TS2s to be connected to TS1 as shown in FIG. 13. When
is_ATC_delta flag is 1, number_of_ATC_delta_entries denotes the number of
ATC_deltas that have been registered in ClipInfo().

Further, in FIG. 12, following_Clip_Information_file_name is a name of
a stream of TS2 to be connected to TS1. When there exist a plurality of TS2s
that correspond to following_Clip_Information_file_name, the values of
ATC_delta that correspond to respective TS2s are registered in ClipInfo().
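
The attachment information described above can be modeled as a small data structure. This is a hedged in-memory sketch only: the field names follow the text, but the class names, the lookup helper, the example file names, and the choice of an integer tick count for ATC_delta are assumptions, and no on-disc field widths are implied.

```python
from dataclasses import dataclass, field

@dataclass
class AtcDeltaEntry:
    following_clip_information_file_name: str  # names the TS2 connected to TS1
    atc_delta: int                             # offset value (unit assumed: ticks)

@dataclass
class ClipInfo:
    entries: list = field(default_factory=list)

    @property
    def is_atc_delta(self) -> bool:
        # Flag: does this ClipInfo carry any ATC_delta value?
        return len(self.entries) > 0

    @property
    def number_of_atc_delta_entries(self) -> int:
        return len(self.entries)

    def atc_delta_for(self, following_clip: str) -> int:
        """Look up the ATC_delta registered for a given following clip."""
        for entry in self.entries:
            if entry.following_clip_information_file_name == following_clip:
                return entry.atc_delta
        raise KeyError(following_clip)

# One TS1 that can be followed by either of two TS2 streams (names made up).
info = ClipInfo([AtcDeltaEntry("00002", 27_000), AtcDeltaEntry("00003", 54_000)])
```

A controller switching from TS1 to a particular TS2 would call `atc_delta_for` with the name of that TS2 to obtain the offset to apply at the connection point.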

When TS1 and TS2 are input to the DVR-STD model in FIG. 8,
multiplexed streams thereof and their attachment information ClipInfo() are
input. ClipInfo() includes information of the aforementioned ATC_delta,
which is handled with a predetermined method by a controller (not shown in
FIG. 8) of the DVR-STD model in the shift between TS1 and TS2, as described
above.



The second changing point is that the size of the audio buffer of the
decoder 20 is changed to the size large enough to meet the following condition.
The condition is that the picture (I picture) to be decoded first in TS2 can be
input to the video buffer by the decode timing thereof after the input of the last
transport packet of TS1 has been completed during the shift between TS1 and
TS2 seamlessly connected to the TS1.

The maximum value of the capacity of the audio buffer needed to meet
the above condition is as follows. That is, the size capable of storing the audio
data amount having a length corresponding to "time large enough to input the
maximum bit amount of I picture to the video buffer by the decode timing
thereof" is required. The maximum value EBn_max of the requirement of the
audio buffer can be represented by the following equation (11).

$$\mathrm{EBn\_max} = (\mathrm{I\_max} / R_v) \times R_a \ \text{[bits]} \qquad (11)$$



Here, I_max is a maximum bit amount of I picture, which corresponds
to the size of the video code buffer EB1 shown in FIG. 8, Rv is an input bit rate
to the video code buffer EB1, and Ra is a bit rate of an audio stream. As shown
in the above equation (11), the size EBn_max of the audio buffer to be
calculated is the value obtained by multiplying the time required for making the
buffer occupation amount of the video code buffer EB1 increase from 0 to
I_max at an input bit rate to the video elementary stream buffer (EB1) by Ra.
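
As a quick numeric sketch of equation (11), the function below evaluates EBn_max for one set of inputs. The input values (a 1 Mbit I picture, a 10 Mbps video input rate, a 640 kbps audio stream) are illustrative assumptions, and the function name is hypothetical.

```python
def audio_buffer_max_bits(i_max_bits, rv_bps, ra_bps):
    """EBn_max in bits: the time to input I_max at Rv, multiplied by Ra.

    Written as I_max * Ra / Rv, which is algebraically the same as
    (I_max / Rv) * Ra.
    """
    return i_max_bits * ra_bps / rv_bps

ebn_max_bits = audio_buffer_max_bits(i_max_bits=1_000_000,
                                     rv_bps=10_000_000,
                                     ra_bps=640_000)
ebn_max_bytes = ebn_max_bits / 8
```

With these inputs the I picture takes 0.1 seconds to enter the video buffer, so the audio buffer has to hold 0.1 seconds of audio.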

As a concrete value, the buffer size that can store the audio data
corresponding to at least 100 milliseconds is recommended. The reason is as
follows. That is, when I picture is encoded every 0.5 seconds, the bit size of I
picture is generally 10% or less of the encoding bit rate. Assuming that the
encoding bit rate is, for example, 10 Mbps, the size of I picture is 1 Mbits or less
in general.

Thus, as the first reason, with at least 100 milliseconds, I picture can be
input to the video buffer of the decoder 20 by the decode timing thereof.
Further, as the second reason, if the audio buffer of the decoder 20 can store the
audio data corresponding to 100 milliseconds, it is possible to multiplex TS1 so
that input of the audio data to the audio buffer is completed 100 milliseconds
earlier than reproduction timing of the audio data. Therefore, when the audio
buffer has the buffer size that can store at least audio data corresponding to 100
milliseconds, it is possible to assure, for the above first and second reasons, at
least 100 milliseconds as the time until which the input of the picture (I picture)
to be decoded first in TS2 to the video buffer has been completed after the input
of the last transport packet of TS1 had been completed during the shift between
TS1 and TS2 seamlessly connected to the TS1.

The capacity of the audio buffer that can store the audio data
corresponding to 100 milliseconds is concretely calculated below.

In the case of Dolby AC3 audio stream of 640 kbps: 640 kbps × 0.1 sec =
64 kbits = 8 kbytes

In the case of Linear PCM audio stream (24 bit sample, 96 kHz sampling
frequency, 8 channels): (24 bits/sample × 96,000 samples/sec × 8 ch) × 0.1 sec =
230,400 bytes
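
The two capacity figures above can be reproduced with integer arithmetic. The helper name is hypothetical, and 1 kbyte is taken as 1,000 bytes, as the 64 kbits = 8 kbytes step implies.

```python
def audio_buffer_bytes(bit_rate_bps, duration_ms=100):
    """Bytes needed to hold duration_ms of audio at the given bit rate."""
    return bit_rate_bps * duration_ms // (1000 * 8)

ac3_bytes = audio_buffer_bytes(640_000)   # Dolby AC3 at 640 kbps
lpcm_bps = 24 * 96_000 * 8                # 24-bit samples, 96 kHz, 8 channels
lpcm_bytes = audio_buffer_bytes(lpcm_bps)
```

The same helper shows why a 1-second buffer is impractical for multichannel LPCM: passing `duration_ms=1000` gives ten times the 100-millisecond figure.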

The advantage obtained by changing the size of audio buffer of DVR-STD
to "the size that can store the amount of the audio data corresponding to the
time large enough to input the picture (I picture) to be decoded first in TS2 to the
video buffer by the decode timing thereof" like the decoder 20 of the present
embodiment described above will be described in detail with reference to FIGS.
14 and 15.

Here, AC3 audio stream having a bit rate of 640 kbps and a sampling
frequency of 48 kHz will be explained as an example. The sample number of
one audio frame of AC3 audio stream is 1,536. Accordingly, the time length of
one audio frame is 32 milliseconds. The byte size of one audio frame is 2,560
bytes.
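
The frame duration and frame size quoted above follow directly from the sample count, sampling frequency, and bit rate. A minimal check, with hypothetical variable names:

```python
SAMPLES_PER_FRAME = 1536   # samples in one AC3 audio frame
SAMPLING_HZ = 48_000       # sampling frequency
BIT_RATE_BPS = 640_000     # audio bit rate

# Duration of one frame in milliseconds: samples / sampling frequency.
frame_ms = SAMPLES_PER_FRAME * 1000 // SAMPLING_HZ
# Size of one frame: bit rate multiplied by the frame duration.
frame_bits = BIT_RATE_BPS * SAMPLES_PER_FRAME // SAMPLING_HZ
frame_bytes = frame_bits // 8
```

An audio overlap of less than 2 audio frames therefore corresponds to less than 64 milliseconds for this stream.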

FIGS. 14(a) and 14(b) are graphs showing examples of changes in the bit
occupation amount of video buffer and audio buffer of the DVR-STD during the
shift between TS1 and TS2 connected seamlessly to TS1 in the case of audio
buffer having 4 kbytes buffer size in the conventional DVR-STD. In FIGS.
14(a) and 14(b), the dotted line denotes a buffer transition of video/audio data of
TS1, and the solid line denotes a buffer transition of video/audio data of TS2.

The audio buffer of 4 kbytes can store the audio data corresponding to 50
milliseconds. Accordingly, at $\mathrm{STC1}^{1}_{audio\_end}$, which is the time at which the
last byte of the last audio packet of TS1 reaches the DVR-STD, it is possible to
multiplex TS1 so that the input of the audio data is completed 50 milliseconds
earlier than the reproduction timing thereof. However, 50 milliseconds are not
enough to input the picture (I picture) to be decoded first in TS2 to the video
buffer by the decoding timing thereof. In this case, encoding is so restricted as
to lower the size of the picture (I picture) to be decoded first in TS2, which
deteriorates the image quality.

Since the audio buffer of 4 kbytes can move ahead audio data by 50
milliseconds, startup delay t1 shown in FIGS. 14(a) and 14(b), which is the time
for inputting the first I picture of TS2 to the video buffer, becomes as small as
up to 50 milliseconds. Therefore, it is impossible to take enough time to input
the first I picture of TS2, lowering the I picture size S1, with the result that
image quality of I picture is deteriorated due to restriction on encoding. As
described above, it has been necessary to provide additional buffer
corresponding to 1 second in addition to 4 kbytes and to input I picture at the
maximum rate RMAX of TS between T1 and T2 in order to increase the startup
delay. Here, description is given of AC3 audio stream having a bit rate of 640
kbps. However, as described above, additional buffer corresponding to 1 second
is extremely large for multichannel LPCM audio.

To cope with this problem, the audio buffer size of the DVR-STD is
changed to, for example, 8 kbytes like the decoder 20 of the present embodiment.
FIGS. 15(a) and 15(b) each show an example in which the capacity of audio
buffer is optimized. More specifically, FIGS. 15(a) and 15(b) are graphs
showing examples of changes in the bit occupation amount of video and audio
buffers of the DVR-STD of the present embodiment during the shift between
TS1 and TS2 that is seamlessly connected to TS1 in the case where the size of
audio buffer is 8 kbytes. In FIGS. 15(a) and 15(b), the dotted line shows a
buffer transition of video/audio data of TS1, and the solid line denotes a buffer
transition of video/audio data of TS2.

The audio buffer of 8 kbytes can store the audio data corresponding to
100 milliseconds. Accordingly, at time $\mathrm{STC1}^{1}_{audio\_end}$, at which the last byte of
the last audio packet of TS1 reaches the DVR-STD, it is possible to multiplex
TS1 so that input of the audio data is completed 100 milliseconds earlier than the
reproduction timing thereof. With at least 100 milliseconds, the picture (I
picture) to be decoded first in TS2 can be easily input to the video buffer by the
decode timing thereof. That is, it is possible to take enough time (startup
delay) t2 to input the first I picture in TS2, which can increase size S2 of the
picture (I picture) to be decoded first in TS2. Therefore, image quality of I
picture can be increased due to lower encoding restriction.
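
The comparison between the 4 kbyte and 8 kbyte audio buffers can be put into numbers. This is a simplified sketch under explicit assumptions: the startup delay is taken to equal the time by which audio can be sent ahead, the video input rate is assumed to be 10 Mbps, 1 kbyte is taken as 1,000 bytes, and the function names are hypothetical.

```python
def audio_lead_time_ms(buffer_bytes, audio_bps):
    """How far ahead of its presentation the buffered audio can be delivered."""
    return buffer_bytes * 8 * 1000 // audio_bps

def max_first_picture_bits(buffer_bytes, audio_bps, video_input_bps):
    """Bits of the first TS2 picture that fit into that lead time."""
    return buffer_bytes * 8 * video_input_bps // audio_bps

AUDIO_BPS = 640_000        # AC3 stream used in the example
VIDEO_BPS = 10_000_000     # assumed input rate to the video buffer

t1_ms = audio_lead_time_ms(4_000, AUDIO_BPS)   # conventional 4 kbyte buffer
t2_ms = audio_lead_time_ms(8_000, AUDIO_BPS)   # 8 kbyte buffer
s1_bits = max_first_picture_bits(4_000, AUDIO_BPS, VIDEO_BPS)
s2_bits = max_first_picture_bits(8_000, AUDIO_BPS, VIDEO_BPS)
```

Under these assumptions, doubling the audio buffer doubles both the available startup delay and the largest I picture that can be delivered in time.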

Further, in the player model 1 as shown in FIG. 8, TS, which consists of a
data stream constituted by a plurality of source packets each having a transport
packet and arrival time stamp and which is read out by a decoder based on the
arrival time stamp and decoded, can be regarded as being generated and
recorded in a multiplexer (information processing device).

As described with reference to, for example, FIGS. 4 to 6, the multiplexer 
includes a video encoding section that generates re-encoded Clip 1 (first 
video encoding stream) to end the presentation with a predetermined picture 
and Clip 2 (second video encoding stream) which is to be presented immediately 
after the picture and which is re-encoded for starting presentation, a 
multiplexing section that multiplexes Clip 1 and an audio encoding stream 
synchronized with Clip 1 to generate TS1 and multiplexes Clip 2 and an audio 
encoding stream synchronized with Clip 2 to generate TS2, and a recording 
section that records the multiplexed stream consisting of TS1 and TS2. In the 
multiplexing section, TS1 and TS2 are multiplexed such that input of the audio 
data to the decoder 20 can be completed by the time at which the input of the 
I picture to the decoder 20 is started, the audio data corresponding to the 
time for inputting the I picture, that is, the second picture, to the video 
buffer of the decoder 20. Note that, as shown in FIG. 5, it is possible to 
generate Bridge-Clip in the encoding section and multiplex Bridge-Clip 
together with TS1 and TS2 in the multiplexing section. 
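The multiplexing condition described above can be expressed as a simple 
check. The sketch below is a simplification under stated assumptions: packet 
arrival times of TS1 and TS2 are taken to be already mapped onto one shared 
timeline, and the `Packet` type and the function name are hypothetical. 

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    arrival_time: int   # ticks on a shared timeline (assumption)
    kind: str           # "video" or "audio"


def audio_completes_before_video_starts(ts1: List[Packet], ts2: List[Packet]) -> bool:
    """True if every TS1 audio packet arrives no later than the first TS2 video packet."""
    ts1_audio = [p.arrival_time for p in ts1 if p.kind == "audio"]
    ts2_video = [p.arrival_time for p in ts2 if p.kind == "video"]
    if not ts1_audio or not ts2_video:
        return True  # nothing to compare, so the condition cannot be violated
    return max(ts1_audio) <= min(ts2_video)


ts1 = [Packet(0, "video"), Packet(10, "audio"), Packet(20, "audio")]
ts2_ok = [Packet(25, "video"), Packet(30, "audio")]
ts2_bad = [Packet(15, "video"), Packet(30, "audio")]

print(audio_completes_before_video_starts(ts1, ts2_ok))
print(audio_completes_before_video_starts(ts1, ts2_bad))
```

A multiplexer could run a check of this kind after scheduling TS1 and, if it 
fails, move the remaining TS1 audio packets earlier. 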

Recorded on a recording medium configured to record a multiplexed stream 
generated by the above multiplexer is the multiplexed stream consisting of 
TS1 that ends with the first picture and TS2 that starts with the second 
picture reproduced subsequent to the first picture, in which TS1 and TS2 can 
be input to the decoder 20 based on their arrival time stamps, and TS1 and TS2 
are multiplexed such that input of the audio data to the decoder can be 
completed by the time at which the input of the second picture, that is, the 
first picture of TS2, to the decoder 20 is started, the audio data 
corresponding to the time required for inputting the second picture to the 
decoder 20. 

In the present embodiment configured as above, when TS1 and TS2 that are 
seamlessly connected to each other are reproduced, transport packets are input 
according to their arrival time stamps even during the period from the time at 
which the input of the last video packet of TS1 to TB1 of the decoder 20 is 
completed to the time at which the remaining packets of TS1 are input to the 
decoder 20, and the size of the audio buffer is changed from 4 kBytes in the 
conventional DVR-STD to a size capable of storing the amount of audio data 
having a length corresponding to the time required for inputting the maximum 
bit amount of an I picture to the video buffer by the decode timing thereof. 
As a result, it is possible to sufficiently assure the time (startup delay) 
required from the time at which the input of the last packet of TS1 has been 
completed to the time at which the input of the I picture, which is the first 
picture of TS2, is completed by the decode timing thereof. Therefore, the 
image quality of the I picture can be increased due to the lower encoding 
restriction. 
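The sizing rule stated here, an audio buffer large enough to hold the audio 
played while the largest I picture is being input, can be sketched with 
integer arithmetic. The maximum I picture size, the video input rate, and the 
audio bit rate below are assumed example values, and the function names are 
hypothetical. 

```python
# Illustrative sketch only; all numeric values are assumptions.

def required_audio_buffer_bytes(max_i_picture_bits: int,
                                video_input_rate_bps: int,
                                audio_bitrate_bps: int) -> int:
    """Bytes of audio played during the time needed to input the largest I picture.

    input time  = max_i_picture_bits / video_input_rate_bps
    audio bytes = input time * audio_bitrate_bps / 8, rounded up
    """
    numerator = max_i_picture_bits * audio_bitrate_bps
    denominator = video_input_rate_bps * 8
    return -(-numerator // denominator)   # ceiling division on integers


def covered_input_time_ms(buffer_bytes: int, audio_bitrate_bps: int) -> float:
    """I picture input time, in milliseconds, that a given audio buffer can cover."""
    return buffer_bytes * 8 * 1000 / audio_bitrate_bps


MAX_I_PICTURE_BITS = 1_000_000       # assumed largest I picture
VIDEO_INPUT_RATE_BPS = 10_000_000    # assumed video input rate
AUDIO_BITRATE_BPS = 640_000          # assumed audio bit rate

needed = required_audio_buffer_bytes(
    MAX_I_PICTURE_BITS, VIDEO_INPUT_RATE_BPS, AUDIO_BITRATE_BPS)
conventional_ms = covered_input_time_ms(4 * 1024, AUDIO_BITRATE_BPS)

print(f"required audio buffer: {needed} bytes")
print(f"4 kByte buffer covers: {conventional_ms:.1f} ms of I picture input")
```

With these assumed values the conventional 4 kByte buffer covers only about 
half of the input time that the larger buffer provides, which is why the 
larger buffer relaxes the limit on the size of the first I picture of TS2. 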

Further, in the conventional method using an additional buffer, when the 
audio data in the TS is assumed to be, for example, multichannel LPCM audio 
data, an additional buffer having an extremely large capacity is required. In 
the present embodiment, by contrast, it is possible to eliminate the need for 
the additional buffer, which has been indispensable in the conventional 
method, by changing the capacity of the audio buffer as above and by inputting 
transport packets according to their arrival time stamps. 
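To give a sense of scale, the sketch below estimates the capacity of an 
additional buffer holding one second of multichannel LPCM audio, one second 
being the additional-buffer length mentioned under Industrial Applicability. 
The LPCM parameters (48 kHz, 24 bits, 8 channels) are assumed example values, 
not figures from this document, and the function names are hypothetical. 

```python
# Illustrative sketch only; LPCM parameters are assumptions.

def lpcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed LPCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def buffer_bytes_for(duration_s: int, bitrate_bps: int) -> int:
    """Bytes needed to hold duration_s seconds at the given bit rate."""
    return duration_s * bitrate_bps // 8


bitrate = lpcm_bitrate_bps(sample_rate_hz=48_000, bits_per_sample=24, channels=8)
additional_buffer = buffer_bytes_for(1, bitrate)   # one second of audio
audio_buffer = 8 * 1024                            # 8 kByte audio buffer

print(f"LPCM bit rate:           {bitrate} bps")
print(f"1 s additional buffer:   {additional_buffer} bytes")
print(f"ratio to 8 kByte buffer: {additional_buffer / audio_buffer:.1f}x")
```

Under these assumptions the additional buffer would exceed one megabyte, 
more than a hundred times the 8 kByte audio buffer. 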

The present invention is not limited to the above embodiment described 
with reference to the accompanying drawings, and it will be apparent to those 
skilled in the art that various modifications, substitutions, or equivalents 
thereof can be made without departing from the claims appended hereto and the 
spirit and scope of the invention. 



Industrial Applicability 

According to the present invention described above, it is possible to edit 
the multiplexed stream consisting of video and audio streams with video frame 
accuracy and reproduce it in a seamless manner, and to eliminate the 
additional buffer corresponding to 1 second that has been required for 
inputting the transport packet at the maximum bit rate of the TS with the 
arrival_time_stamp of the source packet ignored, thereby reducing the buffer 
amount required for the decoder compared with the conventional case. Further, 
it is possible to change the size of the audio buffer to a size capable of 
buffering the audio data having a length corresponding to the time required 
for inputting the second picture to the video buffer, so that the image 
quality of the second picture can be increased due to the lower encoding 
restriction. 



